Visualization Tools for Comparative Genomics applied to Convergent Evolution in Ash Trees
Assembly and analysis of whole genomes is now a routine part of genetic
research, but effective tools for the visualization of whole genomes and their
alignments are few. Here we present two approaches to allow such visualizations
to be done in an efficient and user-friendly manner. These allow researchers to
spot problems and patterns in their data and present them effectively.
First, FluentDNA is developed to tackle single full genome visualization and
assembly tasks by representing nucleotides as colored pixels in a zooming
interface. This enables users to identify features without relying on algorithmic
annotation. FluentDNA also supports visualizing pairwise alignments of
well-assembled whole genomes from chromosome to nucleotide resolution.
Second, Pantograph is developed to tackle the problem of visualizing variation
among large numbers of whole genome sequences. This uses a graph genome
approach, which addresses many of the technical challenges of whole genome
multiple sequence alignments by representing aligned sequences as nodes which
can be shared by many individuals. Pantograph is capable of scaling to thousands
of individuals and is applied to SARS and A. thaliana pangenomes.
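
As a rough illustration of the graph genome idea described above (not Pantograph's actual data model), the sketch below stores each aligned segment once as a node and records each individual as a path of node IDs. The sample names, segment contents, and function names are all hypothetical.

```python
from collections import defaultdict

def node_carriers(paths):
    """Map each node ID to the set of individuals whose path visits it."""
    carriers = defaultdict(set)
    for individual, node_ids in paths.items():
        for node_id in node_ids:
            carriers[node_id].add(individual)
    return dict(carriers)

def spell(path, segments):
    """Reconstruct an individual's sequence by concatenating its node segments."""
    return "".join(segments[node_id] for node_id in path)

# Hypothetical toy data: nodes 2 and 3 are alternative alleles at one site.
segments = {1: "ACGT", 2: "G", 3: "T", 4: "CCA"}
paths = {
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
    "sample_c": [1, 2, 4],
}

carriers = node_carriers(paths)
```

Here nodes 1 and 4 are stored once but shared by all three samples, so adding more individuals that carry existing segments only lengthens the path table, not the sequence store.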
Alongside the development of these new genomics tools, comparative genomic
research was undertaken on worldwide species of ash trees. I assembled 13 ash
genomes and used FluentDNA to quality check the results and discovered
contaminants and a mitochondrial integration. I annotated protein coding genes
in 28 ash assemblies and aligned their gene families. Using phylogenetic analysis,
I identified gene duplications that likely occurred in an ancient whole genome
duplication shared by all ash species. I examined the fate of these duplicated
genes, showing that losses are concentrated in a subset of gene families more
often than predicted by a null model simulation. I conclude that convergent
evolution has occurred in the loss and retention of duplicated genes in different
ash species.
BBSRC BB/S004661/
Skittle: A 2-Dimensional Genome Visualization Tool
Background: It is increasingly evident that there are multiple and overlapping patterns within the genome, and that these patterns contain different types of information - regarding both genome function and genome history. In order to discover additional genomic patterns which may have biological significance, novel strategies are required. To partially address this need, we introduce a new data visualization tool entitled Skittle.
Results: This program first creates a 2-dimensional nucleotide display by assigning four colors to the four nucleotides, and then text-wraps to a user adjustable width. This nucleotide display is accompanied by a "repeat map" which comprehensively displays all local repeating units, based upon analysis of all possible local alignments. Skittle includes a smooth-zooming interface which allows the user to analyze genomic patterns at any scale. Skittle is especially useful in identifying and analyzing tandem repeats, including repeats not normally detectable by other methods. However, Skittle is also more generally useful for analysis of any genomic data, allowing users to correlate published annotations and observable visual patterns, and allowing for sequence and construct quality control.
Conclusions: Preliminary observations using Skittle reveal intriguing genomic patterns not otherwise obvious, including structured variations inside tandem repeats. The striking visual patterns revealed by Skittle appear to be useful for hypothesis development, and have already led the authors to theorize that imperfect tandem repeats could act as information carriers, and may form tertiary structures within the interphase nucleus.
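
The 2-dimensional nucleotide display described in this abstract can be sketched in a few lines. This is a minimal illustration of the text-wrapping idea, not Skittle's code: the color palette, the gray fallback for ambiguous bases, and the function names are assumptions.

```python
# Arbitrary RGB palette (an assumption; the abstract only says four colors are used).
COLORS = {
    "A": (0, 0, 0),
    "C": (255, 0, 0),
    "G": (0, 255, 0),
    "T": (0, 0, 255),
}

def wrap_sequence(seq, width):
    """Text-wrap a sequence into rows of at most `width` nucleotides."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

def to_pixels(rows, default=(128, 128, 128)):
    """Convert rows of nucleotides into rows of RGB tuples; unknown bases become gray."""
    return [[COLORS.get(base, default) for base in row] for row in rows]

# A perfect tandem repeat with a 5-base unit.
seq = "ACGGT" * 4
rows_at_period = wrap_sequence(seq, 5)   # wrap width equals the repeat length
rows_off_period = wrap_sequence(seq, 4)  # wrap width does not match
pixels = to_pixels(rows_at_period)
```

When the wrap width equals the repeat unit length, every row is identical and the repeat appears as vertical stripes; at other widths the same repeat shows up as diagonal bands, which is why an adjustable width helps expose tandem repeats.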
The khmer software package: enabling efficient nucleotide sequence analysis [version 1; referees: 2 approved, 1 approved with reservations]
The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/
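
To make "probabilistic k-mer counting" concrete, here is a minimal Count-Min Sketch written from scratch. It does not use khmer's API and is not claimed to match khmer's implementation; the class, its parameters (`width`, `depth`), and the hashing scheme are assumptions chosen for illustration.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: may overestimate a count, never underestimates it."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, kmer):
        # One independent hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, kmer):
        for row, col in self._slots(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # The smallest counter is the one least inflated by hash collisions.
        return min(self.tables[row][col] for row, col in self._slots(kmer))

def kmers(seq, k):
    """Yield every overlapping fixed-length word of size k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

sketch = CountMinSketch()
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)
```

Memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen, at the cost of occasional overcounts when k-mers collide in every row.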
The khmer software package: enabling efficient nucleotide sequence analysis
The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/
Pangenome Graphs.
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
A high‐quality reference genome for Fraxinus pennsylvanica for ash species restoration and research.
Green ash (Fraxinus pennsylvanica) is the most widely distributed ash tree in North America. Once common, it has experienced high mortality from the non‐native invasive emerald ash borer (EAB; Agrilus planipennis). A small percentage of native green ash trees that remain healthy in long‐infested areas, termed “lingering ash,” display partial resistance to the insect, indicating that breeding and propagating populations with higher resistance to EAB may be possible. To assist in ash breeding, ecology and evolution studies, we report the first chromosome‐level assembly from the genus Fraxinus for F. pennsylvanica with over 99% of bases anchored to 23 haploid chromosomes, spanning 757 Mb in total, composed of 49.43% repetitive DNA, and containing 35,470 high‐confidence gene models assigned to 22,976 Asterid orthogroups. We also present results of range‐wide genetic variation studies, the identification of candidate genes for important traits including potential EAB‐resistance genes, and an investigation of comparative genome organization among Asterids based on this reference genome platform. Residual duplicated regions within the genome probably resulting from a recent whole genome duplication event in Oleaceae were visualized in relation to wild olive (Olea europaea var. sylvestris). We used our F. pennsylvanica chromosome assembly to construct reference‐guided assemblies of 27 previously sequenced Fraxinus taxa, including F. excelsior. Thus, we present a significant step forward in genomic resources for research and protection of Fraxinus species.